Genome-Resolved Meta-Analysis of the Microbiome in Oil Reservoirs Worldwide

Microorganisms inhabiting subsurface petroleum reservoirs are key players in biochemical transformations. The interactions of microbial communities in these environments are highly complex and still poorly understood. This work aimed to assess publicly available metagenomes from oil reservoirs and implement a robust pipeline of genome-resolved metagenomics to decipher metabolic and taxonomic profiles of petroleum reservoirs worldwide. Analysis of 301.2 Gb of metagenomic information derived from heavily flooded petroleum reservoirs in China and Alaska to non-flooded petroleum reservoirs in Brazil enabled us to reconstruct 148 metagenome-assembled genomes (MAGs) of high and medium quality. At the phylum level, 74% of MAGs belonged to bacteria and 26% to archaea. The profiles of these MAGs were related to the physicochemical parameters and recovery management applied. The analysis of the potential functional core in the reservoirs showed that the microbiota was specialized for each site, with 31.7% of the total KEGG orthologies annotated as functions (1690 genes) common to all oil fields, while 18% of the functions were site-specific, i.e., present only in one of the oil fields. The oil reservoirs with a lower level of intervention were the most similar to the potential functional core, while the oil fields with a long history of water injection had greater variation in functional profile. These results show how key microorganisms and their functions respond to the distinct physicochemical parameters and interventions of the oil field operations such as water injection and expand the knowledge of biogeochemical transformations in these ecosystems.


Introduction
Oil reservoirs are unique subsurface environments often located deep in the earth, characterized by a complex matrix of hydrocarbons, low oxygen levels, and a long history of separation from the surface [1]. Currently, while a plethora of bacteria and archaea have been detected in petroleum reservoirs, some patterns in microbial taxonomic composition have been identified that correlate to specific environmental factors such as temperature and depth, with little role for the geographic distance between petroleum reservoirs worldwide [2,3]. Shallow and low-temperature reservoirs show an abundance of Epsilonbacteria and Deltaproteobacteria members, while deeper high-temperature reservoirs are enriched in Clostridiales and Thermotogales. Despite these specific taxa that distinguish oil reservoirs according to their attributes, a core microbiome composition across the world has been identified, showing that the bacterial classes Gammaproteobacteria, Clostridia, and Bacteroidia and the archaeal class Methanomicrobia are ubiquitous in petroleum reservoirs [4].  All downloaded metagenome datasets were submitted to the binning pipeline approach established in this work ( Figure 2) to recover MAGs. The study performed by Liu et al. [11] described the metabolic potential and activity of the microbial communities obtained from the Jiangsu Oil Reservoir (China). They sampled produced water from three different wells, named W2-71, W9-18, and W15-5, and sequenced in Illumina MiSeq platform (2 × 75 bp). Hu et al. [6] used metagenomic shotgun sequencing of six samples from two North Slope oil fields in Alaska to compare the microbial community across the range of physical and chemical conditions and to predict metabolic roles. The produced water samples were obtained from different depths and temperatures, with or without hydrogen sulfide production (souring), and with or without impact by seawater injection. Samples SB1 and SB2 were from Schrader Bluff formation, with a temperature range of 24-27 • C and depths from 1200 to 1400 m. Samples K2 and K3 were from the Kuparuk formation, with a temperature range between 47 and 70 • C and depths between 1785 and 2150 m. And finally, samples I1 and I2 were collected from the Ivishak Formation, the hottest and deepest reservoir of the study, with a temperature range between 80 and 83 • C and depths of 2750 to 3100 m. In the study of Wang et al. [1], the community structure and metabolic potential of oil-water mixture samples collected from the water-flooded Shen84 oil reservoir (Liaohe Oil Field, in China) was investigated. Wells 6111_O and 6111_W showed medium permeability, medium oil saturation, and active edge water, well AJ-5 showed high permeability, high soil saturation, and intensive anthropogenic perturbation, and well 71551 showed extremely low permeability and low oil saturation. Song et al. [12] studied a water-flooded petroleum reservoir at the Shengli Oil Field, located in China. They Microorganisms 2021, 9,1812 4 of 21 sampled produced fluids from the wellhead of the production well to assess the taxonomic and functional information of the subsurface microbial community. In the study performed by Nie et al. [2], crude oil metagenomes from two oil reservoirs in Daqing and Qinghai Oil Fields in China were analyzed. The oil samples were collected from the wellhead of the high-temperature Qinghai (QH) Oil Field and the wellhead of the mesophilic Daqing Oil Field (DQ), to compare the microbiome taxonomic and functional profiles. Finally, Sierra-Garcia et al. [3] sequenced the metagenome from a crude oil sample collected from a non-water flooded and high-temperature Brazilian oil reservoir located at Miranga Oil Field in Brazil (BA-1) to assess the functional genetic potential of the microbial community. Due to the differences in the stage of recovery management and temperatures among the wells, five groups of samples were defined: (i) without water injection and high temperature; (ii) with water injection and high temperature; (iii) seawater injection and high temperature; (iv) water injection and mesophilic temperatures; (v) recycled water injection.
Microorganisms 2021, 9, x FOR PEER REVIEW 4 of 22 a water-flooded petroleum reservoir at the Shengli Oil Field, located in China. They sampled produced fluids from the wellhead of the production well to assess the taxonomic and functional information of the subsurface microbial community. In the study performed by Nie et al. [2], crude oil metagenomes from two oil reservoirs in Daqing and Qinghai Oil Fields in China were analyzed. The oil samples were collected from the wellhead of the high-temperature Qinghai (QH) Oil Field and the wellhead of the mesophilic Daqing Oil Field (DQ), to compare the microbiome taxonomic and functional profiles. Finally, Sierra-Garcia et al. [3] sequenced the metagenome from a crude oil sample collected from a non-water flooded and high-temperature Brazilian oil reservoir located at Miranga Oil Field in Brazil (BA-1) to assess the functional genetic potential of the microbial community. Due to the differences in the stage of recovery management and temperatures among the wells, five groups of samples were defined: (i) without water injection and high temperature; (ii) with water injection and high temperature; (iii) seawater injection and high temperature; (iv) water injection and mesophilic temperatures; (v) recycled water injection.

Quality Control and Filtering
In this study, we used metagenomic sequencing datasets previously obtained from oil reservoirs worldwide to recover MAGs. Sequence datasets were downloaded from databases and processed according to Figure 2. Briefly, quality control of sequencing reads was performed using FASTQ v.0.11.5 [14]. Adapter presence was detected using BBMap v38.90 [15]. Trimmomatic v. 0.36 [16] was used to remove the adapters and to trim reads with quality lower than 30 (Phred score) and length smaller than 100 bp. For some metagenomes with low quality and read lengths of 100 bp, Phred scores 20 and 75 bp minimum length were used as the thresholds.

Metagenome Assembly and Binning
Mash software was used to calculate the pairwise distances between the datasets by employing the MinHash technique [17] (Figure 2), to know which samples could be co-

Quality Control and Filtering
In this study, we used metagenomic sequencing datasets previously obtained from oil reservoirs worldwide to recover MAGs. Sequence datasets were downloaded from databases and processed according to Figure 2. Briefly, quality control of sequencing reads was performed using FASTQ v.0.11.5 [14]. Adapter presence was detected using BBMap v38.90 [15]. Trimmomatic v. 0.36 [16] was used to remove the adapters and to trim reads with quality lower than 30 (Phred score) and length smaller than 100 bp. For some metagenomes with low quality and read lengths of 100 bp, Phred scores 20 and 75 bp minimum length were used as the thresholds.

Metagenome Assembly and Binning
Mash software was used to calculate the pairwise distances between the datasets by employing the MinHash technique [17] (Figure 2), to know which samples could be co-assembled. Metagenome datasets with distance values below 0.1 were co-assembled. Both the trimmed read pairs and the reads left unpaired after quality filtering were used for metagenomic assembly. Assembly and co-assembly modes were performed using Megahit v.1.1.2 [18] with the following parameters: '-k-min 27 -k-max 147 -k-step 10 -mincontig-len 500'. The quality of the generated assemblies was assessed using MetaQUAST v5.0.2 [19]. The coverage profile of the contigs was obtained by using Bowtie2 v2.3.5.1 [20] read alignment program to map the reads of each sample to each assembly. SAM files generated were translated to BAM files, then sorted and indexed using SAMtools v1.9 [21].
MAGs quality was defined according to the MIMAG standard [Completeness-(5* Contamination)] [30]. MAGs with quality scores >50 were used in downstream analyses. High-quality MAGs are defined as completeness >90% and contamination <5% and medium-quality MAGs are defined as completeness >50% and contamination <10%. The mapping rate was estimated using Bowtie2 v1.3.0 [20] to map the reads of each metagenome to the MAGs originated from it; the proportion of reads that mapped with at least 95% identity was obtained with CoverM v0.6.1 (https://github.com/wwood/CoverM, accessed on 5 May 2021) in genome mode.

Taxonomic Assignment and Phylogeny
Taxonomic affiliation of MAGs was carried out using Genome Taxonomy Database and the associated GTDB-Tk toolkit v1.1.0 [31][32][33] and the RefSeq Release 89. GTDB-Tk, which also uses pplacer as a third-party software [34], uses average nucleotide identity (ANI) and genome topology to find the closest genomic relative in its database of >140,000 public prokaryote genomes. A phylogenetic approach was used to place the obtained MAGs in the tree of Bacteria and Archaea. PhylophlAn v3.0 [35] was used to place the genomes in a high-resolution phylogeny, and iTOL [36] was used for the visualization using the Newick file as input.

Functional Assignment
Genes encoded in MAGs were predicted using Prodigal v.2.6.3 [37] and annotated using sequence aligner Diamond v0.9.30 [38] based on the KEGG database (Kyoto Encyclopedia of Genes and Genomes [39]. After annotation, genes found were filtered based on a list of specific genes related to hydrocarbon metabolism (Supplementary Table S2). The completeness of each pathway was calculated based on the fraction of genes present in KEGG Modules.

Environment-Specific Orthogroups
Proteins annotated in the oil reservoir MAGs were compared to related organisms to assess functional novelty related to oil reservoir environments. First, MAGs with taxonomic affiliation at least the family level were identified and grouped. Then, protein sequences from species of the same families of the oil reservoir MAGs set were recovered from GTDB (release 89). Only families with at least four genomes were used for this analysis (including the MAGs generated in this study). All-versus-all pairwise comparisons between proteins in each family were performed using Orthofinder v2.5.2 [40], clustering the proteins into orthogroups (orthologous gene cluster), and generating count matrices with the number of genes within each orthogroup per genome. To identify orthogroups that are exclusive to the oil reservoir MAGs (environment-specific orthogroups), tspex v0.6.2 [41] was used to calculate the specificity measure (SPM) of each orthogroup from the matrices. Only orthogroups with at least three genes were used. If any orthogroup was classified as exclusive to the oil reservoir MAGs, the percentage of enrichment in all orthogroups in each family was calculated comparing with the representative genome from the GTDB. The orthogroups with at least 60% of enrichment were selected and functionally annotated using Diamond v. [38] against KEGG [42] and EggNogg [43] databases. Orthogroups with ambiguous functions (different annotations within the orthogroup) were ignored.

Functional Core Analysis
Annotation of a functional core shared among all oil reservoir datasets analyzed was performed based on KEGG databases. Gene annotations from each oil field were pooled and compared to assess both exclusive and common KEGG categories among sites. The percentage of orthologies in each group was calculated. The relative abundance of the functional core among all sites and of the exclusive functions based on Level 2 categories from KEGG orthologies were assessed.

Co-Assembling Statistics
After filtering low-quality reads, metagenomic datasets were processed by grouping metagenomes by read-level similarity (Minhash distances), to determine which of the metagenome sets could be co-assembled. The resultant distance matrix was plotted in a clustering heatmap (Supplementary Figure S1). Datasets with distances below 0.1 were co-assembled. Four co-assemblies and six individual assemblies were performed. In all cases, the co-assemblies corresponded to samples of the same study. Supplementary Table  S3 summarizes the statistics of the assemblies.

Binning
The individual assemblies and co-assemblies were binned to MAGs. Merging of bins obtained using the pipeline described herein (Figure 2) yielded 176 MAGs. Quality evaluation results indicated that 74 of MAGs had estimated completeness higher than 50% and contamination less than 10% and were classified as medium quality, while 74 MAGs had completeness higher than 90% and contamination less than 5% and were classified as high-quality MAG, according to the MiMAG standard [30] (Supplementary Figure S2). MAGs with low-quality were not used for further analyses.

Taxonomic Affiliation and Phylogeny
For the taxonomic assignment of MAGs, a phylogenetic tree was reconstructed using the 148 MAGs obtained in this study ( Figure 3). One hundred and ten MAGs belonged to the Bacteria domain while 38 belonged to Archaea. The MAGs were grouped into 25 bacterial phyla and four archaeal phyla. The tree showed three large clusters representing the phyla Firmicutes (21 MAGs), Proteobacteria (19 MAGs), and Halobacterota (24 MAGs) ( Figure 3). Smaller clades represented the Campylobacterota, Deferribacterota, Nitrospira, Desulfobacterota, Thermotoga, Euryarchaeota, among others. The most representative family was Methanotrichaceae with nine MAGs, followed by Pseudomonadaceae with seven MAGs, Thermovirgaceae with six MAGs, and Archaeoglobaceae with four MAGs. At the genus level, MAGs were assigned to 94 different genera. The most represented ones were the methanogenic archaeon Methanothrix, with five MAGs, and the proteobacterium Pseudomonas, also with five MAGs, followed by Thermodesulfobacterium (4) and Thermodesulfovibrio (3). The taxonomic assignment of all MAGs is detailed in Supplementary Table S4.
MAGs, Thermovirgaceae with six MAGs, and Archaeoglobaceae with four MAGs. At the genus level, MAGs were assigned to 94 different genera. The most represented ones were the methanogenic archaeon Methanothrix, with five MAGs, and the proteobacterium Pseudomonas, also with five MAGs, followed by Thermodesulfobacterium (4) and Thermodesulfovibrio (3). The taxonomic assignment of all MAGs is detailed in Supplementary Table S4. The first group of samples (without water injection plus high temperature) was composed only by BA-1 sample from Miranga Oil Field, the solely well that had not been water-flooded among the samples studied here. This well may reflect, to some degree, the indigenous thermophilic microbial communities in oil reservoirs before flooding operations. Sixteen MAGs were recovered from this sample, of which 11 MAGs were distributed in three archaeal phyla and five MAGs in four bacterial phyla ( Figure 3). The Archaeal phylum Halobacterota was represented by 50% of the MAGs. The bacterial MAGs were assigned to the phyla Synergistota, Firmicutes, Patescibacteria, and Caldatribacteriota. At the genus level, 13 out of 16 MAGs could not be assigned to any taxa, suggesting that the reservoir harbors several unknown taxa. Two MAGs, including one affiliated to the Microgenomatia class and another affiliated to the Tissierellaceae family were found exclusively in this reservoir. The class Microgenomatia belongs to the Patescibacteria, recently proposed as a phylum [33]. The majority of taxa from this phylum have been proposed through the Metagenome-Assembled Genome approach of difficult access environments, such as groundwater, deep-sea sediments and, deep subsurface [44][45][46][47][48][49]. In addition, three of the Microgenomatia genomes were found in activated sludge from a petrochemical plant [50], and one MAG belonging to Patescibacteria was found in crude oil metagenome from an oil field in Wietze, Germany [51]. Microbes from this phylum are characterized by small cell and genome sizes [52], containing only essential genes and lacking many metabolic functions like amino acid or nucleotide biosynthesis [44]. Due to these special characteristics, a parasitic lifestyle has been suggested [8,53]. Two MAGs from the Synergistales order were found in the BA-1 sample from Miranga Oil Field. One of them was affiliated with the Thermovirgaceae family. Members of this family were found in a hydrocarbon reservoir that was submitted to a CO2 enhanced oil recovery flood 40 years ago [54]. In terms of the archaeal community, this oil reservoir metagenome was The first group of samples (without water injection plus high temperature) was composed only by BA-1 sample from Miranga Oil Field, the solely well that had not been water-flooded among the samples studied here. This well may reflect, to some degree, the indigenous thermophilic microbial communities in oil reservoirs before flooding operations. Sixteen MAGs were recovered from this sample, of which 11 MAGs were distributed in three archaeal phyla and five MAGs in four bacterial phyla ( Figure 3). The Archaeal phylum Halobacterota was represented by 50% of the MAGs. The bacterial MAGs were assigned to the phyla Synergistota, Firmicutes, Patescibacteria, and Caldatribacteriota. At the genus level, 13 out of 16 MAGs could not be assigned to any taxa, suggesting that the reservoir harbors several unknown taxa. Two MAGs, including one affiliated to the Microgenomatia class and another affiliated to the Tissierellaceae family were found exclusively in this reservoir. The class Microgenomatia belongs to the Patescibacteria, recently proposed as a phylum [33]. The majority of taxa from this phylum have been proposed through the Metagenome-Assembled Genome approach of difficult access environments, such as groundwater, deep-sea sediments and, deep subsurface [44][45][46][47][48][49]. In addition, three of the Microgenomatia genomes were found in activated sludge from a petrochemical plant [50], and one MAG belonging to Patescibacteria was found in crude oil metagenome from an oil field in Wietze, Germany [51]. Microbes from this phylum are characterized by small cell and genome sizes [52], containing only essential genes and lacking many metabolic functions like amino acid or nucleotide biosynthesis [44]. Due to these special characteristics, a parasitic lifestyle has been suggested [8,53]. Two MAGs from the Synergistales order were found in the BA-1 sample from Miranga Oil Field. One of them was affiliated with the Thermovirgaceae family. Members of this family were found in a hydrocarbon reservoir that was submitted to a CO 2 enhanced oil recovery flood 40 years ago [54]. In terms of the archaeal community, this oil reservoir metagenome was characterized by the predominant presence of members of Halobacterota (8) phylum, followed by Thermoplasmota (2) and Euryarchaeota (1). MAGs from Thermoplasmota phylum could not be affiliated to higher taxonomic ranks. MAG2 was affiliated with the Methanobacter family. Members of this family are known to reduce H 2 and CO 2 to produce methane [55]. In a study across 22 oil reservoirs, Methanobacteria were found in all of them [56]. Zhao et al. [57] analyzed eight oil reservoirs from northern China and found that the core microbiota was composed of three bacterial and eight archaeal genera, including the genus Methanobacterium from the Methanobacteriaceae family [57]. In this study, one MAG from the family Methanoregulaceae was recovered and affiliated to the species Methanolinea tarda. Members of Methanolinea are hydrogenotrophic archaea [58] and are also found as part of the core in some studies [56,57,59]. Four MAGs were affiliated to the Methanotrichaceae family, two of them belonging to the Methanothrix genus. Finally, three MAGs were assigned to Halobacterota phylum, without any classification at higher ranks. However, they were placed into the Methanotrichales order clade (Figure 3 and Supplementary Table S4). The family Methanotrichaceae was proposed as a new family to replace Methanosaetaceae. Methanothrix is the only genus in this family, with three species, M. harundinacea, M. thermoacetophila, and M. soehgenii [60]. This genus produces methane mainly from acetate, but it is also able to reduce CO 2 via direct interspecies electron transfer (DIET) with Geobacter spp. [61,62]. Members of Methanothrix are more frequently dominant in reservoirs from medium to high-temperature reservoirs [56].
In the second group of samples (water injection plus high temperature), only the dataset from the Qinghai Oil Field (QH) was included. This oil field has been submitted to water-flooding for 15 years. Binning using this metagenome yielded four archaeal MAGs (Supplementary Table S4), which were assigned to Halobacterota (Methermicoccus shengliensis), Euryarchaeota (Methanothermococcus thermolithotrophicus and Methanobacteriaceae member), and Nanoarchaeota (member of Woesearchaeales order). The last one is an ultra-small hyperthermophilic obligate ectosymbiont with a reduced genome that lives on the surface of several archaeal hosts of the Crenarchaeota phylum [63]. Members of Nanoarchaeota phylum have already been found in hot springs in Yellowstone National Park (United States) under high temperatures (~90 • C) [64]. They are also associated with volcanoes rich in hydrocarbons [64], deep-sea hydrothermal vents, and deep lakes [63][64][65][66][67][68][69][70][71][72][73]. Methermicoccus shengliensis is a methylotrophic methanogen. It has been isolated from production water samples from a high-temperature oil reservoir in Japan [74] and from Shengli Oil Field, where the in-situ temperatures range from 75 to 80 • C [75]. Methermicoccus shengliensis is a hydrogenotrophic and methylotrophic methanogen [56]. Gao and colleagues investigated the spatial distribution of microbial communities and their drivers in 20 water-flooded oil reservoirs and two that had not been flooded, in China, and observed that Methanothermococcus was found more frequently in reservoirs with high salinity [56]. A study in the Xinjiang Luliang long-term water-flooded oil reservoir detected Methanothermococcus thermolithotrophicus by mcrA sequencing [76]. With regards to the Bacteria Domain, 12 MAGs were recovered from the QH dataset, which belonged to eight phyla, as follows: Campylobacterota (1), Cloacimonadota (1), Deferribacterota (2), Desulfuromonadota (1), Firmicutes (3), Patescibacteria (1), Proteobacteria (2), and Synergistota (1). Two genomes of Flexistipes (Deferribacterota phylum) were found. There is evidence that species from this genus, as F. sinusarabici (MAG94), have genes that encode for electrically conductive pili (e-pili), similar to Geobacter species, that is responsible for the DIET process [77]. Geoalkalibacter (MAG146), belonging to Proteobacteria, is a strictly anaerobic bacterium, able to reduce Fe(III). A strain of this genus was isolated from a produced water in the Redwash oil field in Utah, USA, with 52 • C of in situ temperature [78]. This genus was also found in a reservoir that had not been water-flooded [56]. Members of the genera Pseudomonas, Marinobacter, Sulfurospirillum, among others, are universally detected in reservoirs with a wide range of temperature, and most of them are hydrocarbon-degraders, surfactant producers, and nitrate or sulfate reducers [56]. Many of these microorganisms are aerobic. Due to the low redox potential in the oil reservoirs, anaerobic and facultative microorganisms are abundant. However, a lot of aerobic microbes are also detected in this environment, and water injection may be closely related to that phenomenon [76].
The third group of samples (seawater injection plus high temperature) were clustered Kuparuk (K2 and K3) and Ivishak formations (I1 and I2). The binning process of these four metagenomes allowed the recovery of 24 MAGs. Some different physicochemical characteristics were found among these samples; first, the temperatures in the Ivishak formation were between 80-83 • C and in the Kuparuk were from 47 to 70 • C, further, the well I2 has the longest history of seawater injection among all samples, and that is reflected by the presence of the Desulfonauticus spp. (Desulfobacterota phylum), sulfatereducing bacteria present in the deep sea [79,80]. On the other hand, the I1 well has only recently been injected with seawater. Only two MAGs were recovered from the I1 sample. One of them was assigned to the thermophilic fermenting anaerobe Thermoanaerobacter pseudethanolicus (Firmicutes phylum), which has been isolated from oil reservoirs. This species can respire thiosulfate contributing to souring [81] and, depending on the ecological conditions, may act in syntrophy with acetoclastic methanogens [82]. The second MAG was assigned to Halomonas, Gammaproteobacteria class, a known hydrocarbon degrader [83]. MAGs found in the Kuparuk wells were different probably because the temperatures were lower, and the use of seawater injection is more recent than in the other site. These were classified as Methermicoccus shengliensis and the hyperthermophilic sulfide producing Archaeoglobus fulgidus [59,79], belonging to the Halobacterota phylum, and as the CO 2 /H 2reducing methanogen Methanothermobacter thermautotrophicus and the sulfide-producing Thermococcus sibiricus, members of the Euryarchaeota phylum. The last two are frequently found in oil thermophilic oil reservoirs [54,56,57,76,79]. Seven genomes belonging to Firmicutes phylum were recovered: Thermoanaerobacter pseudethanolicus (2), Caldanaerobacter subterraneus (1), Thermoacetogeniales (1), Ammonifexales (1), and Moorellia (2), which are known to be syntrophic and fermentative bacteria [75,[84][85][86]. The family Thermotogaceae was represented with three genomes, belonging to two different thermophilic genera, Thermotoga and Pseudothermotoga. These microbes are indigenous to petroleum reservoirs and more commonly abundant in non-water flooded oil fields [87]. They are fermentative bacteria and have been detected in several high-temperature oil reservoirs [75,81,88,89]. Lastly, one MAG affiliated to Thermodesulfobacterium commune, Desulfobacterota phylum, was found. This bacterium can incompletely oxidize fatty acids to produce acetate [90].
The fourth group of samples (water injection plus mesophilic temperatures) were clustered Liaohe, Jiangsu, Shengli, and Daqing Oil Field datasets. In this cluster, 52 MAGs were recovered, of which only ten belonged to the Archaea Domain. As expected, this group of wells was characterized by the presence of many Proteobacteria members, especially the well-known hydrocarbon-degraders and surfactant producers like the facultative anaerobe organotrophs Pseudomonas (P. stutzeri, P. aureginosa, P. balearica), Acinetobacter, Marinobacterium, and Tepidimonas, and the fermenters and nitrate reducers Azovibrio and Thauera. In many investigations, these bacteria have been detected in most of the oil reservoirs studied [56] and some authors have proposed these genera as part of the core microbiota of oil reservoirs [57,91,92]. However, they can be more abundant in water-flooded reservoirs [86]. Two MAGs belonging to Firmicutes phylum were recovered, Lactobacillus ozensis and Anoxybacillus gonensis. species of the fermentative genus Lactobacillus have already been detected in oil samples, although its role in oil reservoirs remains unknown [93]. A member of the Anoxybacillus genus was isolated from production water of a high-temperature oil reservoir and showed the ability to reduce nitrate and to control sulfate reducers [94]. One MAG assigned to the thermophilic sulfate-reducing genus Thermodesulfovibrio (Nitrospira phylum) was reconstructed. Members of this genus were previously detected in oil field environments [95]. Archaeal MAGs were mainly assigned to Methanotrichaceae and Archaeogloboceae Families and one MAG was assigned to the genus Methanolobus, a methylotrophic methanogen belonging to the Methanosarcinaceae family [96,97]. This methanogen was predominant in low and medium-temperature oil reservoirs [56].
Finally, in the fifth group of samples (recycled water injection), SB1 and SB2 from the Schrader formation were included. Assembly enabled the recovery of 40 MAGs, of which seven were assigned to the Archaea domain, represented by 5 families, Methanobacteriaceae, Methanocorpusculaceae, Methanocullaceae, Methanomethylophilaceae, and Methanotrichaceae. The hydrogenotrophic and mesophilic Methanocalculus archaeal genus was found in reservoirs without anthropogenic perturbation and were dominant in medium to high-temperature reservoirs [56]. Methanoculleus is a methanogen able to reduce CO 2 in syntrophy with bacteria such as Syntrophus and Marinobacter [79]. This genus was proposed as part of the core microbiota across several oil reservoirs in China [57] and was found as dominant in low to medium-temperature reservoirs [56]. A study performed by Liang et al. [98] proposed cooperation between Methanoculleus and Anaerolineaceae members for the n-alkanes degradation. Three MAGs affiliated to the Anaerolinaceae family were found in these samples, and other MAGs belonging to the Kosmotogaceae family and one to the Petrotogaceae family, both from the Thermotogota phylum, were recovered. Members from these two last families have been found in oil field environments [98,99]. Lastly, three MAGs classified as sulfate-reducing bacteria from the Desulfotomaculales order were found. Desulfotomaculum has already been proposed as hydrocarbon-degrading bacteria because of high similarity with gene sequences coding for benzyl succinate synthase (bssA), responsible for the activation of alkyl-substituted aromatic hydrocarbons [59].

Metabolic Potential
Similar to the taxonomic analysis, annotation of the potential metabolism of MAGs revealed the presence of a microbial functional core across all oil fields and the occurrence of exclusive (or site-specific) functions, depending on the recovery management and in situ temperatures ( Figure 4, Supplementary Table S2). Without any anthropogenic intervention, the indigenous microbial communities in oil reservoirs are initially dominated by slowgrowing anaerobes such as methanogens and some Clostridia members that are adapted to living on a high concentration of hydrocarbons and high temperatures, among other extreme conditions [13]. This was observed in the first group, represented by the sample from the Miranga Oil Field, that was mainly governed by methanogenic archaea, and as consequence, the potential metabolism prevailing was mainly methanogenesis-related ( Figure 4, taxa names in purple/modules 12-16). In the second group, which originated from high-temperature water flooded oil reservoir (Figure 4, taxa names in dark green), it was possible to observe diversification in both taxonomic and metabolic levels. In addition to the methanogenesis metabolism (modules 12-16), alkane-degradation (module 4) and dissimilatory nitrate reduction (module 17) were also present. Depending on the water source, the injection can deliver oxidants and different electron acceptors, thus promoting fast-growing microbes as members from Deferribacterota, Proteobacteria, and Bacteroidota phyla, with thermodynamically more favorable metabolism as nitrate or sulfate reduction [13]. In the third cluster, comprising Kuparuk and Ivishak Oil Fields, seawater was mainly used for secondary recovery. The use of seawater can introduce sulfate to the reservoirs, promoting sulfate reduction in the community and the consequent production of H 2 S [100]. Desulfonauticus sp. was probably introduced in the oil reservoir by seawater, and the functional analysis of this MAG showed the capability to reduce sulfate with the respective module almost complete (Figure 4, taxa names green/module 18). In this comparative study, the fourth group can be considered as the "greatest level of intervention" amongst all. These were mesophilic water-flooded oil fields. Here, a great shift in the functional profile of the microbial community was observed (Figure 4, taxa names red, light green, and brown). Modules of hydrocarbon aerobic degradation (Figure 4, 3-8) are found in several MAGs. This was expected since samples were dominated by the fast-growing facultative anaerobes and opportunistic members of Marinobacter, Marinobacterium, and other well-known hydrocarbon degraders as Pseudomonas and Tepidimonas [3,13,56]. Especially in the Liaohe and Jiangsu oil fields, many genomes with metabolic potential to reduce nitrate and sulfate were recovered. This is in accordance with the fact that recycled treated water was used for secondary recovery in the Schrader Oil Field. From all MAGs obtained in this study, only a few from this group showed metabolic potential to degrade aromatic hydrocarbons anaerobically. These MAGs were associated with Desulfotomaculales and Syntrophorhabdaceae. As it was discussed above, members of the Desulfotomaculum had already been associated with fumarate activation mechanism in alkyl-substituted aromatic hydrocarbons as toluene and ethylbenzene [59]. The family Syntrophorhabdaceae embraces the single mesophilic genus Syntrophorhabdus, which is capable of oxidizing phenol p-cresol, 4-hydroxybenzoate, isophthalate, and benzoate in association with an H 2 -scavenging part-ner (e.g., Methanospirillum hungatei or Desulfovibrio sp.) [101,102]. Here, MAGs belonging to Syntrophorhabdaceae and Desulfotomaculales showed several genes from the module of anaerobic Benzoyl-CoA degradation (module 11), a universal intermediate formed during degradation of aromatic compounds [103].   The metabolic analysis carried out in this study also provided evidence that anaerobic hydrocarbon degradation via fumarate addition (module 10) is a rare metabolism detected in oil reservoirs. The latter results are expected once mechanisms alternative to the archetypical fumarate addition reactions (e.g., anaerobic hydrocarbon carboxylation or hydroxylation) are present in oil reservoirs [3,104], which are not described in any KEGG category or module yet and therefore could not be detected in this work. However, genes for further activated hydrocarbon transformations were frequently detected, such as module 9, phenol anaerobic degradation, which was found in 61 MAGs out of 148, suggesting that the organisms represented by these MAGs mediate the downstream degradation of aromatic compounds via anaerobic benzoate degradation. On the other hand, the potential for aerobic hydrocarbon degradation was showed to occur in oil wells submitted to water flooded and in oil reservoirs with mesophilic temperatures. Last but not least, catechol meta-cleavage and all methanogenesis modules were found across all oil fields, suggesting that these metabolisms can be ubiquitous in oil reservoirs.
The module of catechol meta-cleavage degradation was found in 52% of the MAGs (78 MAGs). Even with module completeness of around 10%, it was a relevant result as this metabolic pathway is aerobic and was found in all MAGs belonging to Archaea. A bar plot with the frequency of each gene from catechol meta-cleavage and its phylum affiliation was constructed ( Figure 5). The gene praC, which encodes the 4-oxalocrotonate tautomerase (4-OT), showed a higher frequency and was present in MAGs affiliated to 14 phyla, mainly in Proteobacteria, Halobacterota and, Euryarchaeota ( Figure 5). This gene was found in 15 archaeal MAGs, which were submitted to the PATRIC platform [105] to search for evidence if it is part of an operon. As it can be observed in the MAG comparison, this gene was not found in an operon region (Supplementary Figure S3). In three MAGs, this gene was annotated as a 4-oxalocrotonate tautomerase-like (4-OT) enzyme (MAG17, MAG103, and MAG106) (Supplementary Table S4). In MAG103 ( Figure S3a, Supplementary  Table S4) and MAG17 ( Figure S3b, Supplementary Table S4), assigned to the Archaeoglobus genus, the upstream genes found encode unknown proteins, while the downstream genes encode chorismate synthase and Acyl-CoA dehydrogenase. In MAG106 (Figure S3c), assigned to Methanoculleus marisnigri, a gene that encodes a haloacid dehalogenase-like hydrolase was found upstream of the praC gene. The 4-OT enzyme is part of a group known as promiscuous enzymes since they have multiple low-level catalytic activities in addition to their primary physiological function [106,107]. It is already known that many organisms through Bacteria and Archaea domains harbor 4-OT enzyme homologs which have unidentified physiological functions. Many of these organisms are not aromatic hydrocarbon degraders, suggesting that these homologs might have other functions [106].
Several recent studies of the methyl-coenzyme M reductase complex (Mcr) in MAGs have shown that divergent mcr -like genes are involved in methane/alkane metabolism. Here, we recovered the mcrA genes from MAGs assigned to archaea for further phylogenetic analysis. A phylogenetic tree was constructed using mcrA sequences recovered from 11 MAGs and approximately 280 curated mcrA sequences recovered from PhyMet 2 [108]. Interestingly, two MAGs (MAG 47 and MAG 59) (Supplementary Table S4) each contained two different mcrA sequences, and two archaeoglobaceae-assigned MAGs contained mcrA sequences. In addition to the mcrA sequences found, four mcr-like sequences belonging to NM1 (Candidatus Methanoliparia) and NM3 new lineages were included [109]. According to the phylogenetic analysis ( Figure 6), MAG 16 assigned to Methanothrix clustered with NM3, Methanopyrales, and Methanomassiliicoccales. NM3 is suggested to perform methyldependent hydrogenotrophic methanogenesis with the potential of using methanol and methanethiol [109]. The two MAGs assigned to Archaeoglobacea were related to NM1 lineage. NM1 belongs to Ca. Methanoliparia has been reported to be capable of both methane and short-chain alkane metabolisms [109,110]. These results indicate that archaeal MAGs identified in this work could also be involved with hydrocarbon degradation. genase-like hydrolase was found upstream of the praC gene. The 4-OT enzyme is part of a group known as promiscuous enzymes since they have multiple low-level catalytic activities in addition to their primary physiological function [106,107]. It is already known that many organisms through Bacteria and Archaea domains harbor 4-OT enzyme homologs which have unidentified physiological functions. Many of these organisms are not aromatic hydrocarbon degraders, suggesting that these homologs might have other functions [106]. Here, we recovered the mcrA genes from MAGs assigned to archaea for further phylogenetic analysis. A phylogenetic tree was constructed using mcrA sequences recovered from 11 MAGs and approximately 280 curated mcrA sequences recovered from PhyMet 2 [108].  Table S4) each contained two different mcrA sequences, and two archaeoglobaceae-assigned MAGs contained mcrA sequences. In addition to the mcrA sequences found, four mcr-like sequences belonging to NM1 (Candidatus Methanoliparia) and NM3 new lineages were included [109].
According to the phylogenetic analysis ( Figure 6), MAG 16 assigned to Methanothrix clustered with NM3, Methanopyrales, and Methanomassiliicoccales. NM3 is suggested to perform methyl-dependent hydrogenotrophic methanogenesis with the potential of using methanol and methanethiol [109]. The two MAGs assigned to Archaeoglobacea were related to NM1 lineage. NM1 belongs to Ca. Methanoliparia has been reported to be capable of both methane and short-chain alkane metabolisms [109,110]. These results indicate that archaeal MAGs identified in this work could also be involved with hydrocarbon degradation.

Functional Core Analysis
To investigate if there is a potential functional core shared by all oil reservoirs under analysis, KEGG annotations of the MAGs from each oil field were pooled and the KEGG orthologies common to all were assessed. At the same time, the exclusive functions of each oil field were also evaluated. A total of 25,563 ORFs were annotated, distributed in 5332 different KEGG orthologies. Functions that were present in at least 8 out of 9 oil fields were called the potential functional core (1690 genes), which represented 31.7% of total

Functional Core Analysis
To investigate if there is a potential functional core shared by all oil reservoirs under analysis, KEGG annotations of the MAGs from each oil field were pooled and the KEGG orthologies common to all were assessed. At the same time, the exclusive functions of each oil field were also evaluated. A total of 25,563 ORFs were annotated, distributed in 5332 different KEGG orthologies. Functions that were present in at least 8 out of 9 oil fields were called the potential functional core (1690 genes), which represented 31.7% of total functional annotation (Figure 7a). Site-specific genes (orthologies that were found in one oil field) accounted for 18.8% (1007 KEGG functions) of total annotated KEGG orthologies (Figure 7a). The percentage of specific functions of each site was calculated (Figure 7a). Liaohe Oil Field showed the most distinct metabolic potential compared to the "functional core", comprising 41.9% of exclusive functions, followed by Daqing Oil Field with 17.37%, and Kuparuk and Jiangsu with 10.62% and 8.64% of exclusive genes, respectively. Shengli and Miranga Oil Fields had the lowest percentage of exclusive functional potential and thus showed the most similar potential metabolism compared to the "core", with 1.39% and 1.09% of the exclusive KEGG orthologies, respectively. These results suggest a possible correlation between the anthropogenic intervention level and the percentage of exclusive functions. For example, Miranga was the oil field without any intervention, and it had the lower percentage of exclusive functions, while Liaohe with the highest level of perturbations (water injection and mesophilic temperatures) had the highest percentage of exclusive genes (more different from the core).
The main metabolisms encompassed by the potential functional core were assessed (Figure 7b). The functional core comprised almost all metabolic categories. The most abundant core metabolisms were carbohydrate and energy metabolisms, the latter including methane, nitrogen, and sulfur metabolisms. All methanogenesis pathways were found in the functional core, with hydrogenotrophic methanogenesis as the most represented one. In addition, assimilatory sulfate reduction and nitrogen fixation were also part of the functional core. Xenobiotics biodegradation and the metabolism was less represented in the functional core, with five different functions, four from benzoate degradation (genes praC, pcaC, mhpE, and bsdB) and one from nitrotoluene degradation (gene hyaBC). Genes mhpE and praC belong to the catechol meta-cleavage pathway, and as was discussed above, gene praC encodes to a 4-OT, a promiscuous enzyme. Genes pcaC and bsdB are also part of aerobic metabolism. These results suggest that aerobic hydrocarbon degradation metabolism is likely an important potential energy strategy for the microbial community in oil fields. However, methanogenesis seems to be the most relevant metabolism in this extreme environment, being omnipresent independently of the good management practice employed.
Within the exclusive orthologies of each oil field, Ivishak, Schrader, and Shengli Oil Fields showed functional specialization in xenobiotics biodegradation (Figure 7c). For example, in the Shengli Oil Field, more than 50% of specific genes were from the xenobiotics degradation category, and these genes were different from the ones occurring in the other oil fields. These results suggest that in each site microorganisms are specialized in different biodegradation pathways.
On the other hand, genes for sulfur-oxidizing (sox genes) were exclusive in the Liaohe Oil Field. This is consistent with the use of nitrate injection in this reservoir, a promising oil souring control strategy that promotes sulfur oxidation [13,111]. Despite the beneficial role, intermediates from sulfide oxidation are potentially corrosive [112][113][114]. degradation category, and these genes were different from the ones occurring in the other oil fields. These results suggest that in each site microorganisms are specialized in different biodegradation pathways.
On the other hand, genes for sulfur-oxidizing (sox genes) were exclusive in the Liaohe Oil Field. This is consistent with the use of nitrate injection in this reservoir, a promising oil souring control strategy that promotes sulfur oxidation [13,111]. Despite the beneficial role, intermediates from sulfide oxidation are potentially corrosive [112][113][114].

Environment-Specific Orthogroups
To explore the functional novelty in the microbiomes of oil reservoirs, a taxonomyinformed approach was used by comparing 67 out of 148 MAGs obtained to related genomes at the family level and identifying orthogroups (cluster of orthologous genes) that are exclusive to the oil reservoir genomes at the family level. Among the 46 families that were analyzed, no unique orthogroups (out of 58,769) were found in the oil field datasets. Due to this result, tspex specificity measure (SPM) output was used to identify enriched orthogroups in the MAGs obtained in this study compared to the representative genomes from GTDB. No orthogroups with at least 60% of enrichment were found (Supplementary  Table S5). With the thresholds and conditions of this analysis, no evidence of environment-specific or gene enrichment at the family level was found amongst the microbiomes of oil fields under study, possibly because the families of these environments are already highly adapted to these environments. Maybe at higher taxonomic levels, it would be possible to find exclusive orthogroups related to oil fields.

Environment-Specific Orthogroups
To explore the functional novelty in the microbiomes of oil reservoirs, a taxonomyinformed approach was used by comparing 67 out of 148 MAGs obtained to related genomes at the family level and identifying orthogroups (cluster of orthologous genes) that are exclusive to the oil reservoir genomes at the family level. Among the 46 families that were analyzed, no unique orthogroups (out of 58,769) were found in the oil field datasets. Due to this result, tspex specificity measure (SPM) output was used to identify enriched orthogroups in the MAGs obtained in this study compared to the representative genomes from GTDB. No orthogroups with at least 60% of enrichment were found (Supplementary  Table S5). With the thresholds and conditions of this analysis, no evidence of environmentspecific or gene enrichment at the family level was found amongst the microbiomes of oil fields under study, possibly because the families of these environments are already highly adapted to these environments. Maybe at higher taxonomic levels, it would be possible to find exclusive orthogroups related to oil fields.

Conclusions
Oil reservoirs are very complex niches that can be disturbed by intervention practices for oil recovery (water injection, use of biocides, etc.). Herein, metagenomic information was retrieved from geographical distinct oil reservoirs that corresponded to petroleum fields under different management practices and in situ temperatures. Despite these differences, analysis of metagenomes at the genome level through MAG recovery indicated a core of functions shared by microorganisms in all oil reservoirs under study. The functional core comprised not only basic cellular functions, related to energy and replication, but also methanogenesis pathways and some genes related to aerobic hydrocarbon degradation. On the other hand, hydrocarbon anaerobic degradation via fumarate addition was not relevant. An in-depth analysis of site-specific functions (orthologies that were found only in one oil field) revealed functional specialization in xenobiotics biodegradation in each oil field. Such metabolic specialization is likely driven by the selective pressures imposed by the distinct anthropogenic intervention practices and in situ temperatures. Lastly, the analysis of gene orthologs at the family level did not indicate exclusive or environment-specific genes in petroleum-associated MAGs compared to other environments. Understanding the indigenous and "disturbed" microbial community profiles and their functional potential is essential for the evaluation of the efficiency and consequences of each intervention practice applied to oil reservoirs. Further, knowledge on the taxonomic and metabolic shifts in the oil field microbiome related to anthropogenic intervention is important to design future rationale and efficient oil recovery and toxic pollutant mitigation measures.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/microorganisms9091812/s1, Figure S1: Pairwise distances between the metagenomic datasets. Figure S2: Quality characteristics across low (n = 28), medium (n = 74) and high quality (n = 74) MAGs. (a) Distribution across completeness, (b) Distribution across contamination, (c) Quality dispersion, yellow dot line indicates the cut-off for medium-quality MAGs; green dot line indicates the cut-off for high-quality MAGs. The colors of the dot represent the oil reservoir and respective geographic location where the MAGs come from. MAGs at right and above the yellow and green dot lines were used for the downstream analyses. Figure S3: Schematic representation of the sequence annotation of the 10 Kb around the praC gene in three MAGs obtained in this study in comparison with closely related public genomes. (a) MAG17, (b) MAG103, (c) MAG106. Red arrows represent the praC gene. Table S1: Metagenome datasets information. Table S2: List of specific genes according to KEGG database. Table S3: Assemblies statistics. Table S4: Complete list of MAGs obtained. Table S5: Edited Tspex comparison matrix. It was calculated the differences between SPM from oil fields' MAGs and from families of GTDB. Negative values mean no enrichment. Positive values mean enrichment of the orthogroups in each family from oil field MAGs compared with families genomes from the GTDB. Value 1 completely enrichment orthogroup.